OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information it contains has therefore not been validated.

An evaluation of retrieval effectiveness using spelling‐correction and string‐similarity matching methods on Malay texts

Internal identifier: 001D93 (Main/Exploration)

Authors: Zainab Abu Bakar [Malaysia]; Tengku Mohd T. Sembok [Malaysia]; Mohammed Yusoff [Malaysia]

Source: Journal of the American Society for Information Science, vol. 51, no. 8, pp. 691-706 (2000)

RBID : ISTEX:ACE6B72DB582152248EB43F31B20282866F2519D

Abstract

This article evaluates the effectiveness of spelling‐correction and string‐similarity matching methods in retrieving words from a Malay dictionary that are similar to a set of query words. The spelling‐correction techniques used are SPEEDCOP, Soundex, Davidson, Phonix, and Hartlib. Two dynamic‐programming methods, which measure the longest common subsequence and the editcost distance, are also used. Several combinations of query and dictionary words are searched in the experiments; the best combination stems both query and dictionary words with an existing Malay stemming algorithm. The retrieval effectiveness (E) and the retrieved‐and‐relevant (R&R) mean measures are calculated from a weighted combination of recall and precision values. The results are then compared with those of digram, an existing string‐similarity method. Digram gives the best R&R and E results. Editcost distance produces the next‐best E results, and both dynamic‐programming methods rank second on the R&R mean measure.
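The abstract names the two dynamic‐programming measures, the digram comparison, and the E measure without defining them. The Python sketch that follows gives textbook versions for readers unfamiliar with them. It is not the authors' code: the unit edit costs, the Dice‐coefficient form of digram similarity, and van Rijsbergen's formula for E with a weight `beta` are assumptions, and all function names are hypothetical.

```python
"""Hypothetical sketches of the string-similarity measures named in the abstract.

Assumptions (not taken from the paper): unit insert/delete/substitute costs for
edit distance, Dice's coefficient over unique character bigrams for digram,
and van Rijsbergen's E measure with weight beta.
"""


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence, by dynamic programming."""
    # prev[j] holds the LCS length of the prefix of a seen so far and b[:j].
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
            else:
                curr[j] = max(prev[j], curr[j - 1])
        prev = curr
    return prev[-1]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit costs (an assumed cost scheme)."""
    # prev[j] is the distance between the current prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr[j] = min(prev[j] + 1,          # delete ch_a
                          curr[j - 1] + 1,      # insert ch_b
                          prev[j - 1] + cost)   # substitute or match
        prev = curr
    return prev[-1]


def digram_similarity(a: str, b: str) -> float:
    """Dice coefficient over the sets of character bigrams (an assumed formula)."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        # Strings shorter than two characters have no bigrams to compare.
        return 1.0 if a == b else 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


def e_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """van Rijsbergen's E = 1 - (1 + b^2)PR / (b^2 P + R); lower is better."""
    if precision == 0 and recall == 0:
        return 1.0
    b2 = beta * beta
    return 1 - (1 + b2) * precision * recall / (b2 * precision + recall)
```

For example, under these assumptions `makan` ("eat") and `makanan` ("food") have an LCS length of 5, an edit distance of 2, and a digram similarity of 8/9 (about 0.89), while `e_measure(0.5, 0.5)` returns 0.5. Both DP functions keep only two rows of the table, so memory grows with the length of the second word rather than with the product of both lengths.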

Url: https://api.istex.fr/document/ACE6B72DB582152248EB43F31B20282866F2519D/fulltext/pdf
DOI: 10.1002/(SICI)1097-4571(2000)51:8<691::AID-ASI20>3.0.CO;2-U


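Affiliations: Faculty of Information Sciences and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi, Malaysia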


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">An evaluation of retrieval effectiveness using spelling‐correction and string‐similarity matching methods on Malay texts</title>
<author>
<name sortKey="Bakar, Zainab Abu" sort="Bakar, Zainab Abu" uniqKey="Bakar Z" first="Zainab Abu" last="Bakar">Zainab Abu Bakar</name>
</author>
<author>
<name sortKey="Sembok, Tengku Mohd T" sort="Sembok, Tengku Mohd T" uniqKey="Sembok T" first="Tengku Mohd T." last="Sembok">Tengku Mohd T. Sembok</name>
</author>
<author>
<name sortKey="Yusoff, Mohammed" sort="Yusoff, Mohammed" uniqKey="Yusoff M" first="Mohammed" last="Yusoff">Mohammed Yusoff</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:ACE6B72DB582152248EB43F31B20282866F2519D</idno>
<date when="2000" year="2000">2000</date>
<idno type="doi">10.1002/(SICI)1097-4571(2000)51:8&lt;691::AID-ASI20&gt;3.0.CO;2-U</idno>
<idno type="url">https://api.istex.fr/document/ACE6B72DB582152248EB43F31B20282866F2519D/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">002066</idno>
<idno type="wicri:Area/Istex/Curation">001F28</idno>
<idno type="wicri:Area/Istex/Checkpoint">001396</idno>
<idno type="wicri:doubleKey">0002-8231:2000:Bakar Z:an:evaluation:of</idno>
<idno type="wicri:Area/Main/Merge">001E93</idno>
<idno type="wicri:Area/Main/Curation">001D93</idno>
<idno type="wicri:Area/Main/Exploration">001D93</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">An evaluation of retrieval effectiveness using spelling‐correction and string‐similarity matching methods on Malay texts</title>
<author>
<name sortKey="Bakar, Zainab Abu" sort="Bakar, Zainab Abu" uniqKey="Bakar Z" first="Zainab Abu" last="Bakar">Zainab Abu Bakar</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Malaisie</country>
<wicri:regionArea>Faculty of Information Sciences and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi</wicri:regionArea>
<wicri:noRegion>43600 UKM Bangi</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Sembok, Tengku Mohd T" sort="Sembok, Tengku Mohd T" uniqKey="Sembok T" first="Tengku Mohd T." last="Sembok">Tengku Mohd T. Sembok</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Malaisie</country>
<wicri:regionArea>Faculty of Information Sciences and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi</wicri:regionArea>
<wicri:noRegion>43600 UKM Bangi</wicri:noRegion>
</affiliation>
</author>
<author>
<name sortKey="Yusoff, Mohammed" sort="Yusoff, Mohammed" uniqKey="Yusoff M" first="Mohammed" last="Yusoff">Mohammed Yusoff</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Malaisie</country>
<wicri:regionArea>Faculty of Information Sciences and Technology, Universiti Kebangsaan Malaysia, 43600 UKM Bangi</wicri:regionArea>
<wicri:noRegion>43600 UKM Bangi</wicri:noRegion>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="j">Journal of the American Society for Information Science</title>
<title level="j" type="abbrev">J. Am. Soc. Inf. Sci.</title>
<idno type="ISSN">0002-8231</idno>
<idno type="eISSN">1097-4571</idno>
<imprint>
<publisher>John Wiley &amp; Sons, Inc.</publisher>
<pubPlace>New York</pubPlace>
<date type="published" when="2000">2000</date>
<biblScope unit="volume">51</biblScope>
<biblScope unit="issue">8</biblScope>
<biblScope unit="page" from="691">691</biblScope>
<biblScope unit="page" to="706">706</biblScope>
</imprint>
<idno type="ISSN">0002-8231</idno>
</series>
<idno type="istex">ACE6B72DB582152248EB43F31B20282866F2519D</idno>
<idno type="DOI">10.1002/(SICI)1097-4571(2000)51:8&lt;691::AID-ASI20&gt;3.0.CO;2-U</idno>
<idno type="ArticleID">ASI20</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0002-8231</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">This article evaluates the effectiveness of spelling‐correction and string‐similarity matching methods in retrieving similar words in a Malay dictionary associated with a set of query words. The spelling‐correction techniques used are SPEEDCOP, Soundex, Davidson, Phonix, and Hartlib. Two dynamic‐programming methods that measure longest common subsequence and editcost‐distance are used. Several search combinations of query and dictionary words are performed in the experiments, the best being one that stems both query and dictionary words using an existing Malay stemming algorithm. The retrieval effectiveness (E) and retrieved and relevant (R&amp;R) mean measures are calculated from weighted combination of recall and precision values. Results from these experiments are then compared with available digram, a string‐similarity method. The best R&amp;R and E results are given by using digram. Editcost‐distances produce the best E results, and both dynamic‐programming methods rank second in finding R&amp;R mean measures.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Malaisie</li>
</country>
</list>
<tree>
<country name="Malaisie">
<noRegion>
<name sortKey="Bakar, Zainab Abu" sort="Bakar, Zainab Abu" uniqKey="Bakar Z" first="Zainab Abu" last="Bakar">Zainab Abu Bakar</name>
</noRegion>
<name sortKey="Sembok, Tengku Mohd T" sort="Sembok, Tengku Mohd T" uniqKey="Sembok T" first="Tengku Mohd T." last="Sembok">Tengku Mohd T. Sembok</name>
<name sortKey="Yusoff, Mohammed" sort="Yusoff, Mohammed" uniqKey="Yusoff M" first="Mohammed" last="Yusoff">Mohammed Yusoff</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 001D93 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 001D93 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:ACE6B72DB582152248EB43F31B20282866F2519D
   |texte=   An evaluation of retrieval effectiveness using spelling‐correction and string‐similarity matching methods on Malay texts
}}

Wicri

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024